
51 research outputs found

Evaluation and Acceleration of High-Throughput Fixed-Point Object Detection on FPGAs

Author: Amit K. Roy-Chowdhury
Walid A. Najjar
Xiaoyin Ma
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

RECONFIGURABLE COMPUTING Introduction

Author: Paolo Ienne
Walid A. Najjar
Publication venue: eScholarship, University of California
Publication date: 01/01/2014

eScholarship - University of California

RECONFIGURABLE COMPUTING Introduction

Author: Paolo Ienne
Walid A. Najjar
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

Infoscience - École polytechnique fédérale de Lausanne

eScholarship - University of California

Compiler Generated Systolic Arrays For Wavefront Algorithm Acceleration on FPGAs

Author: Betul Buyukkurt
Walid A. Najjar
Publication date: 01/01/2008

Wavefront algorithms, such as the Smith-Waterman algorithm, are commonly used in bioinformatics for exact local and global sequence alignment. These algorithms are highly computationally intensive and are therefore excellent candidates for FPGA-based acceleration. However, there is no standard form of these algorithms; they are used in a wide variety of situations with various constraints. It is therefore not practical to have a standard kernel that can be mapped to an FPGA, hence the importance of being able to compile such code from a high-level language. ROCCC is a C-to-VHDL compiler that optimizes and parallelizes the most frequently executed kernel loops in applications such as multimedia, scientific, and high-performance computing. In this paper we describe the transformations performed by ROCCC, which transformed the kernel of the Smith-Waterman algorithm into a hardware systolic array mapped onto the FPGA on the SGI Altix RASC blade. We report a throughput increase of over 3,000x over a 2.8 GHz Xeon.
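
The wavefront structure that makes Smith-Waterman a good fit for a systolic array can be illustrated in software. The Python sketch below assumes a simple linear gap penalty and hypothetical scores (match +2, mismatch -1, gap -1, none taken from the paper) and fills the same score matrix twice: once row by row, and once anti-diagonal by anti-diagonal. It only demonstrates the dependence pattern a compiler such as ROCCC can exploit; it is not ROCCC's transformation or its generated hardware.

```python
# Hypothetical linear-gap scoring parameters (not taken from the paper).
MATCH, MISMATCH, GAP = 2, -1, -1


def _cell(h, a, b, i, j):
    """Smith-Waterman recurrence for cell (i, j), clamped at zero for local alignment."""
    s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
    return max(0, h[i - 1][j - 1] + s, h[i - 1][j] + GAP, h[i][j - 1] + GAP)


def sw_row_major(a: str, b: str) -> int:
    """Reference version: fill the score matrix row by row, return the best local score."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            h[i][j] = _cell(h, a, b, i, j)
            best = max(best, h[i][j])
    return best


def sw_wavefront(a: str, b: str) -> int:
    """Same recurrence, filled one anti-diagonal (i + j == d) at a time.

    Each cell on diagonal d reads only cells on diagonals d-1 and d-2, so the
    inner loop carries no dependence between its iterations; in hardware those
    cells could be computed concurrently by a row of processing elements.
    """
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):
        # All (i, j) with 1 <= i <= n, 1 <= j <= m and i + j == d.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            h[i][j] = _cell(h, a, b, i, j)
            best = max(best, h[i][j])
    return best
```

For example, `sw_wavefront("ACGT", "ACGT")` gives 8 (four matches at +2 each), and both functions agree on any input pair. The loop order is the point: because no two cells on one anti-diagonal depend on each other, the inner `i` loop of `sw_wavefront` is the one a hardware array would spread across processing elements.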

CiteSeerX

Crossref

Fast Area Estimation to support Compiler Optimizations in FPGA-based Reconfigurable Systems

Author: Dhananjay Kulkarni
Walid A. Najjar
Publication venue: IEEE Press
Publication date: 01/01/2002

Several projects have developed compiler tools that translate high-level languages down to hardware description languages for mapping onto FPGA-based reconfigurable computers. These compiler tools can apply extensive transformations that exploit the parallelism inherent in the computations. However, the transformations can have a major impact on the chip area (number of logic blocks) used on the FPGA. It is therefore imperative that the compiler user be provided with feedback indicating how much space is being used. In this paper we present a fast compile-time area estimation technique to guide the compiler optimizations. Experimental results show that our technique achieves an accuracy within 2.5% for small image-processing operators, and within 5.0% for larger benchmarks, compared to the usual post-compilation synthesis tool estimations. The estimation time is on the order of milliseconds, compared to several minutes for a synthesis tool.
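
To make compile-time area feedback concrete, here is a minimal sketch of an additive estimator in Python. It assumes a hypothetical per-operator cost table (adders linear in bit width, multipliers quadratic) rather than the paper's actual model, which the abstract does not describe. It shows how a transformation such as loop unrolling changes the estimate, and how an estimate's accuracy against a synthesis report is expressed as a percentage error.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-operator area models, in abstract "area units" as a function
# of operand bit width. A real estimator would calibrate these against the
# target FPGA family and synthesis tool; these numbers are illustrative only.
AREA_MODEL: Dict[str, Callable[[int], int]] = {
    "add": lambda w: w,       # carry-chain adder: roughly linear in width
    "sub": lambda w: w,
    "cmp": lambda w: w,
    "mux2": lambda w: w,
    "reg": lambda w: w,       # one flip-flop per bit
    "mul": lambda w: w * w,   # array multiplier: roughly quadratic in width
}

Op = Tuple[str, int]  # (operator kind, bit width)


def estimate_area(ops: List[Op]) -> int:
    """Sum the modeled area of every operator in a flat list of operators."""
    return sum(AREA_MODEL[kind](width) for kind, width in ops)


def unroll(ops: List[Op], factor: int) -> List[Op]:
    """Model full replication of a loop body (ignores any sharing between copies)."""
    return ops * factor


def percent_error(estimate: float, actual: float) -> float:
    """Relative error of an estimate versus a reference (e.g. post-synthesis) figure."""
    return 100.0 * abs(estimate - actual) / actual
```

With a loop body of one 8-bit multiplier, one 16-bit adder and one 16-bit register, `estimate_area` returns 64 + 16 + 16 = 96 units, and unrolling by 4 gives 384. If a synthesis tool hypothetically reported 100 units for the body, `percent_error(96, 100)` would be 4.0, the same kind of figure as the 2.5% and 5.0% accuracy results reported in the abstract.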

CiteSeerX

Experimental Evaluation of Blocking and Non-Blocking Multithreaded Code Execution

Author: Lucas Roh
Murali Annavaram
Walid A. Najjar

The objective of multithreaded execution models is to mask the latency of interprocessor communications and remote memory accesses in large-scale multiprocessors. Several such models combine aspects of dataflow-like execution with the von Neumann model in an attempt to provide both efficient synchronization (as in the dataflow model) and efficient exploitation of program locality (as in the von Neumann model). We refer to these models as data-driven multithreading models. One of the factors that distinguishes these models is the thread execution strategy: a thread can be either non-blocking or blocking. Another factor is the architectural support for dynamic synchronization: the locality present within and among threads can potentially be exploited by a proper storage hierarchy for the synchronization store (operand storage). Two storage models have been proposed for data-driven multithreaded execution. One is frame based, in which all the threads belonging to a code-block share one stora..
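
The non-blocking, data-driven execution strategy described above can be sketched as a small scheduler. In this Python illustration, each thread carries a synchronization count equal to the number of operands it still needs. Writing an operand into the thread's slot decrements the count, and a thread becomes ready only when the count reaches zero, after which it runs to completion without blocking. The class and field names are hypothetical, and the simple per-thread operand store does not model either of the storage models discussed in the abstract.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class Thread:
    """A non-blocking thread: it starts only when all operands are present."""
    name: str
    sync_count: int                                   # operands still missing
    body: Callable[[Dict[str, int]], int]             # runs to completion
    consumers: List[Tuple[str, str]] = field(default_factory=list)  # (thread, slot)


class Machine:
    """Toy data-driven scheduler with a per-thread operand store (hypothetical)."""

    def __init__(self, threads: List[Thread]) -> None:
        self.threads = {t.name: t for t in threads}
        self.store: Dict[str, Dict[str, int]] = {t.name: {} for t in threads}
        self.ready: Deque[str] = deque()
        self.results: Dict[str, int] = {}

    def send(self, thread: str, slot: str, value: int) -> None:
        """Write an operand and enqueue the thread once its count hits zero."""
        self.store[thread][slot] = value
        t = self.threads[thread]
        t.sync_count -= 1
        if t.sync_count == 0:
            self.ready.append(thread)

    def run(self) -> Dict[str, int]:
        """Execute ready threads until none remain; no thread ever waits mid-body."""
        while self.ready:
            name = self.ready.popleft()
            t = self.threads[name]
            value = t.body(self.store[name])
            self.results[name] = value
            for consumer, slot in t.consumers:
                self.send(consumer, slot, value)
        return self.results


def example() -> Dict[str, int]:
    """Compute (x + y) * (x - y) for x = 7, y = 3 with three threads."""
    m = Machine([
        Thread("add", 2, lambda s: s["x"] + s["y"], [("mul", "l")]),
        Thread("sub", 2, lambda s: s["x"] - s["y"], [("mul", "r")]),
        Thread("mul", 2, lambda s: s["l"] * s["r"]),
    ])
    for name in ("add", "sub"):
        m.send(name, "x", 7)
        m.send(name, "y", 3)
    return m.run()
```

In `example()`, `add` and `sub` become ready once both of their inputs arrive, and `mul` fires only after both of its operands have been sent, so the result for `mul` is (7 + 3) * (7 - 3) = 40. A blocking strategy would instead let a thread start early and suspend at the first missing operand.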

CiteSeerX

Reconfigurable Computing

Author: Paolo Ienne
Walid A. Najjar
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref